An information-based sequence distance and its application to whole mitochondrial genome phylogeny

Authors

  • Ming Li
  • Jonathan H. Badger
  • Xin Chen
  • Sam Kwong
  • Paul E. Kearney
  • Haoyong Zhang
Abstract

MOTIVATION: Traditional sequence distances require an alignment and are therefore not directly applicable to whole genome phylogeny, where events such as rearrangements make full-length alignments impossible. We present a sequence distance that works on unaligned sequences, using the information-theoretic concept of Kolmogorov complexity, and a program to estimate this distance. RESULTS: We establish the mathematical foundations of our distance and illustrate its use by constructing a phylogeny of the Eutherian orders from complete unaligned mitochondrial genomes. This phylogeny is consistent with the commonly accepted one for the Eutherians. A second, larger mammalian dataset is also analyzed, yielding a phylogeny generally consistent with the commonly accepted one for the mammals. AVAILABILITY: The program to estimate our sequence distance is available at http://www.cs.cityu.edu.hk/~cssamk/gencomp/GenCompress1.htm. The distance matrices used to generate our phylogenies are available at http://www.math.uwaterloo.ca/~mli/distance.html.
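
To make the idea of an alignment-free, compression-based distance concrete, here is a minimal sketch. It is not the authors' program: Kolmogorov complexity K is not computable, so the sketch substitutes the compressed size from Python's general-purpose `zlib` for the estimate that GenCompress would provide. The abstract does not state the distance formula, so the form used here, d(x, y) = 1 - (K(x) - K(x|y)) / K(xy), is an assumption, as is the approximation of K(x|y) by C(yx) - C(y). The function names `compressed_size`, `conditional_size`, and `info_distance` are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Stand-in for K(x): length in bytes of the zlib-compressed input.

    Assumption: zlib replaces a DNA-specific compressor; it is a much
    cruder estimator of Kolmogorov complexity.
    """
    return len(zlib.compress(data, 9))


def conditional_size(x: bytes, y: bytes) -> int:
    """Stand-in for K(x|y): extra bytes needed for x once y is known.

    Approximated as C(yx) - C(y), clamped at zero because a real
    compressor can occasionally give a slightly negative difference.
    """
    return max(compressed_size(y + x) - compressed_size(y), 0)


def info_distance(x: bytes, y: bytes) -> float:
    """Assumed form: d(x, y) = 1 - (K(x) - K(x|y)) / K(xy).

    Values near 0 suggest that y explains most of x; values near 1
    suggest that the two sequences share little information. With a
    practical compressor the result can fall slightly outside [0, 1].
    """
    shared = compressed_size(x) - conditional_size(x, y)
    return 1.0 - shared / compressed_size(x + y)
```

A typical use would be to call `info_distance` on every pair of unaligned genome sequences, collect the values into a distance matrix, and pass that matrix to a distance-based tree-building method. Note that zlib only finds repeats within a 32 KB window, so for longer sequences a compressor with a larger context would be needed.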


Similar resources

Sequencing and Molecular Analysis of ATP 6 and ATP 8 of Mitochondrial Genome in Khorasanian Native Chickens

In order to perform breeding programs and improve production of native chickens, preserving genetic diversity in different areas of Iran is important due to the reduced available population. Genome sequencing is considered the most functional approach to determine the phylogeny relation between close populations. The aim of the present study was the evaluation of the phylogeny and genetic nucle...

Appraisal of the entire mitochondrial genome for DNA barcoding in birds

DNA barcoding based on a standardized region of 648 base pairs of mitochondrial DNA sequences from Cytochrome C Oxidase 1 (COX1) is proposed for animal species identification. Recent studies suggested that DNA barcoding has been effective for identifying 94% of bird species. The proposed threshold of 10 times the average intraspecific variation could be used for the identification and delimitation ...

Normalized Information Distance and Whole Mitochondrial Genome Phylogeny Analysis

A new class of similarity measures aimed at measuring the evolutionary relation of sequences is studied. A prime example is the “normalized information distance”, based on the noncomputable notion of Kolmogorov complexity. We demonstrate that it is a metric, takes values in [0, 1], and is universal. To apply it (and some related metrics) we use a simple approximation scheme to computationally c...

Phylogeny Based on Whole Genome as inferred from Complete Information Set Analysis.

Previous molecular phylogeny algorithms mainly rely on multi-sequence alignments of cautiously selected characteristic sequences, thus not directly appropriate for whole genome phylogeny where events such as rearrangements make full-length alignments impossible. We introduce here the concept of Complete Information Set (CIS) and its measurement implementation as evolution distance without reference ...

Molecular phylogeny of some avian species using Cytochrome b gene sequence analysis

Veritable identification and differentiation of avian species is a vital step in conservative, taxonomic, forensic, legal and other ornithological interventions. Therefore, this study involved the application of molecular approach to identify some avian species i.e. Chicken (Gallus gallus), Muscovy duck (Cairina moschata), Japanese quail (Coturnix japonica), Laughing dove (Streptopelia senegale...

Journal title:
  • Bioinformatics

Volume 17, Issue 2

Pages  -

Publication date: 2001